This talk shares key lessons I have learned throughout my career building and deploying machine learning systems in critical environments across several sectors. I will take a deep dive into how to build scalable, distributed machine learning data pipelines using Airflow with a Celery backend. I will also compare Airflow with alternative technologies such as Luigi, Chronos and Pinball, and explain what sets it apart. If you attend, you will gain an understanding of the fundamentals of Airflow, together with its caveats and the workarounds for more complex use cases. As we work through the examples, I will cover the challenges you will run into when scaling machine learning systems, and how Airflow can address them using a manager-worker-queue architecture for distributed processing with Celery. By the end of the talk you will have the knowledge required to build your own industry-ready machine learning pipelines that process data at scale, and I will provide further reading so you can apply what you have learned almost right away.
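To give a flavour of the manager-worker-queue idea mentioned above, here is a minimal sketch that uses only the Python standard library. It is not Airflow or Celery code and does not reflect their APIs: in a real deployment the queue would be a message broker and the workers would be separate processes or machines. The names `run_pipeline` and `square` are hypothetical and exist only for illustration.

```python
import queue
import threading


def run_pipeline(tasks, num_workers=3):
    """Manager: enqueue tasks, start workers, and collect their results.

    Each task is a (name, func, arg) tuple. This is an illustrative
    stand-in for a broker-backed setup, not Airflow's or Celery's API.
    """
    task_queue = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        # Worker: pull tasks off the shared queue until a sentinel arrives.
        while True:
            item = task_queue.get()
            if item is None:  # sentinel meaning "no more work"
                task_queue.task_done()
                break
            name, func, arg = item
            value = func(arg)
            with lock:
                results[name] = value
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The manager only schedules work; it never runs a task itself.
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker

    task_queue.join()
    for t in threads:
        t.join()
    return results


def square(x):
    """Hypothetical stand-in for a pipeline step such as feature extraction."""
    return x * x


tasks = [(f"task_{i}", square, i) for i in range(5)]
results = run_pipeline(tasks)
```

The point of the design is that the manager and the workers share nothing except the queue, so adding capacity means adding workers rather than changing the scheduling logic. This sketch omits error handling: a task that raises would stop its worker.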
Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/